We know each customer’s ZIP code. Combined with publicly available information from sources such as INEGI, it can be used to estimate the socioeconomic level of each savings customer.
Available sources:
AGEB stands for Área GeoEstadística Básica (Basic Geostatistical Area), and a locality is a general term used by CONAPO to define several AGEBs.
This document uses information from the socioeconomic regions defined by INEGI.
ZIP code geographical information is available. According to the official postal code webpage, there are 32,448 different ZIP codes in Mexico, of which 14,871 are available as shapefiles.
The polygons defining the ZIP codes are not equivalent to the polygons defining the AGEBs, so a mapping between them is needed before the publicly available information can be used.
Perhaps the simplest solution is to find the centroid of each ZIP code and AGEB, and then just map a given ZIP code to the closest AGEB centroid.
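A minimal sketch of the centroid step in Python, assuming each polygon is available as a list of (longitude, latitude) vertices and is small enough that treating the coordinates as planar is acceptable. The function name `polygon_centroid` and the sample square are hypothetical, and the original analysis may well have relied on a GIS library instead.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    `vertices` is a list of (x, y) pairs in order around the boundary;
    the first vertex does not need to be repeated at the end.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon: zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical square used only to illustrate the call.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(square)
```

Note that the centroid of a non-convex polygon can fall outside the polygon itself, which is one more reason to treat the centroid-based mapping as an approximation.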
Each AGEB has a classification that aims to capture the differences among AGEBs based on indicators related to housing, education, health and employment, built from the latest population census. Each AGEB is assigned to one of 7 strata: stratum 7 contains the AGEBs with the most favorable average conditions, and stratum 1 those with the least favorable.
The following images show maps of Mexico City and its surroundings, Monterrey and Guadalajara.
Map with centroids of each polygon:
Now, same map for Guadalajara, Jalisco:
And finally, for Monterrey, Nuevo León:
ZIP code information with their centroids can be seen in the next map of Mexico City:
ZIP code information with their centroids can be seen in the next map of Guadalajara. Some centroids may not match the plotted polygon exactly because the database treats the ZIP code and the identifier as different groups.
ZIP code information with their centroids can be seen in the next map of Monterrey:
Finally, plotting the centroids of AGEBs and ZIP codes together for Mexico City, we get:
Guadalajara:
Monterrey:
So, for each available ZIP code, the closest AGEB centroid is found and a mapping is made to assign an AGEB to each ZIP code, such that we get a table in the following format:
| ZIP | ZIP long | ZIP lat | Nearest AGEB | AGEB long | AGEB lat | Distance (km) | Classification |
|---|---|---|---|---|---|---|---|
| 1000 | -99.19328 | 19.34674 | 0901000011129 | -99.19294 | 19.34740 | 0.0813726 | 7 |
| 1010 | -99.19391 | 19.36064 | 0901000010972 | -99.19487 | 19.36071 | 0.1007520 | 7 |
| 1020 | -99.18719 | 19.35735 | 0901000010987 | -99.18858 | 19.36013 | 0.3419371 | 7 |
| 1030 | -99.17933 | 19.35734 | 0901000011063 | -99.17872 | 19.35448 | 0.3237404 | 7 |
| 1040 | -99.19279 | 19.35596 | 0901000011044 | -99.19429 | 19.35471 | 0.2092340 | 7 |
| 1048 | -99.20502 | 19.36202 | 090100001092A | -99.20353 | 19.36357 | 0.2330690 | 6 |
| 1049 | -99.19715 | 19.35357 | 0901000011044 | -99.19429 | 19.35471 | 0.3255840 | 7 |
| 1050 | -99.18253 | 19.34970 | 0901000011133 | -99.18462 | 19.34641 | 0.4264119 | 7 |
| 1060 | -99.19831 | 19.34950 | 0901000011114 | -99.19524 | 19.35028 | 0.3327968 | 7 |
| 1070 | -99.18654 | 19.34449 | 0901000011133 | -99.18462 | 19.34641 | 0.2946388 | 7 |
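The nearest-centroid assignment behind this table can be sketched as follows. This is an illustration, not the original implementation: it assumes great-circle (haversine) distance on a spherical Earth, uses a brute-force search, and the names `haversine_km` and `nearest_ageb` are hypothetical. The coordinates are copied from a few rows of the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_ageb(zip_centroid, ageb_centroids):
    """Return (ageb_id, distance_km) of the AGEB centroid closest to a ZIP centroid."""
    lon, lat = zip_centroid
    return min(
        ((ageb_id, haversine_km(lon, lat, a_lon, a_lat))
         for ageb_id, (a_lon, a_lat) in ageb_centroids.items()),
        key=lambda pair: pair[1],
    )


# Centroids as (longitude, latitude), taken from the table rows above.
ageb_centroids = {
    "0901000011129": (-99.19294, 19.34740),
    "0901000010972": (-99.19487, 19.36071),
    "0901000010987": (-99.18858, 19.36013),
    "0901000011044": (-99.19429, 19.35471),
}
zip_centroids = {
    "1000": (-99.19328, 19.34674),
    "1010": (-99.19391, 19.36064),
    "1049": (-99.19715, 19.35357),
}

# Map every ZIP code to its nearest AGEB and the distance to it.
mapping = {cp: nearest_ageb(c, ageb_centroids) for cp, c in zip_centroids.items()}
for cp, (ageb_id, dist_km) in mapping.items():
    print(cp, ageb_id, round(dist_km, 4))
```

A brute-force search is quadratic in the number of centroids; for the full set of ZIP codes and AGEBs a spatial index would likely be preferable. Distances computed this way should be close to, but not necessarily identical to, those in the table, since the original distance method is not stated.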
This approach has a weakness: as the maps above show, ZIP code polygons generally cover a larger area than AGEBs, so assigning a single AGEB to each ZIP code ignores the heterogeneity within that ZIP code.
First, let’s look at the distribution of the AGEB classification across the country. Recall that stratum 7 means the AGEB has the most favorable conditions on average, and stratum 1 the least favorable.
And now, the mapping of the ZIP codes:
The distribution changed drastically. As the following graph shows, the original AGEBs were both urban (U) and rural (R), whereas the mapping contains only urban ZIP codes; this may be one reason why the distribution changed so much.
And now let’s analyze the sample with 1 million savings customers.
Out of the 1 million customers, the ZIP code mapping is available for `r sum(datos$zip_code %in% mapeo$CP)` of them, distributed as follows: